Search Result

Imbalanced network traffic classification method based on improved forest rotation algorithm

DING Yaojun

Journal of Computer Applications 2015, 35 (12): 3348-3351. DOI: 10.11772/j.issn.1001-9081.2015.12.3348

To address the low accuracy of imbalanced network traffic classification, an improved rotation forest algorithm was proposed that combines the Bootstrap sampling of the Bagging algorithm with a base classifier selection algorithm based on accuracy ranking. Firstly, the original training set was divided into subsets according to features, each subset was sampled with Bagging, and the coefficient matrix of principal components was computed by Principal Component Analysis (PCA). Then, the features were transformed using the original training set and the principal component coefficient matrix to generate new training subsets. To increase the diversity among training sets, Bagging was applied again to sample these subsets, and the resulting samples were used to train C4.5 base classifiers. Finally, the testing set was used to evaluate the base classifiers, which were sorted and filtered by overall classification accuracy. The classifiers with high accuracy were chosen to produce the combined classification result. An imbalanced network traffic data set was used for the experiments, and precision and recall were used to evaluate C4.5, Bagging, rotation forest and the improved rotation forest. The time efficiency of the four algorithms was evaluated by the training time and testing time of the models. The experimental results show that the classification accuracy of the improved rotation forest algorithm is above 99.5% on the World Wide Web (WWW), Mail, Attack and Peer-to-Peer (P2P) protocols, and its recall is also higher than that of rotation forest, Bagging and C4.5. The proposed algorithm can be used for network intrusion forensics, maintaining network security and improving the quality of network service.
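The final selection step described in this abstract (rank the base classifiers by accuracy, keep the best, combine their outputs) can be illustrated with a minimal sketch. It is not the paper's implementation: the PCA rotation and C4.5 training are omitted, the base classifiers are hypothetical one-feature threshold rules, majority voting is an assumed way of combining the kept classifiers, and all function names and data are invented for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement (the Bagging bootstrap step)."""
    return [rng.choice(rows) for _ in rows]

def accuracy(classifier, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in zip(samples, labels) if classifier(x) == y)
    return correct / len(labels)

def select_top_classifiers(classifiers, samples, labels, keep):
    """Sort classifiers by accuracy (best first) and keep the top `keep`."""
    ranked = sorted(classifiers,
                    key=lambda c: accuracy(c, samples, labels),
                    reverse=True)
    return ranked[:keep]

def majority_vote(classifiers, x):
    """Combine the kept classifiers by majority vote (an assumed rule)."""
    votes = Counter(c(x) for c in classifiers)
    return votes.most_common(1)[0][0]

def make_threshold_classifier(threshold):
    """Hypothetical stand-in for a trained base classifier such as C4.5."""
    return lambda x: "mail" if x < threshold else "www"

# Toy evaluation data: one numeric feature per flow, hypothetical labels.
samples = [1, 2, 3, 6, 7, 8]
labels = ["mail", "mail", "mail", "www", "www", "www"]

classifiers = [make_threshold_classifier(t) for t in (0, 2.5, 4.5, 5, 7.5)]
top = select_top_classifiers(classifiers, samples, labels, keep=3)
resampled = bootstrap_sample(samples, random.Random(0))
```

Here `majority_vote(top, x)` gives the ensemble prediction for a new feature value `x`. In a fuller version, each base classifier would be trained on its own rotated, bootstrap-resampled training subset before this ranking step.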

GOMDI: GPU OpenFlow massive data network analysis model

ZHANG Wei, XIE Zhenglong, DING Yaojun, ZHANG Xiaoxiao

Journal of Computer Applications 2014, 34 (8): 2243-2247. DOI: 10.11772/j.issn.1001-9081.2014.08.2243

OpenFlow enhances the Quality of Service (QoS) of traditional networks, but it has drawbacks such as inefficient network session identification and poor packet forwarding paths. Building on current OpenFlow research, this paper proposed the GPU OpenFlow Massive Data Network Analysis (GOMDI) model, which integrates a biological sequence algorithm, GPU parallel computing and machine learning methods. The network session matching algorithm and the path selection algorithm of GOMDI were designed. The experimental results show that, in a real network, the GOMDI network session matching algorithm achieves a speedup of more than 300 over the CPU environment, and that its path selection algorithm keeps the packet loss rate below 5% and the network delay below 20 ms. Thus, the GOMDI model can effectively improve network performance and meet the need for real-time processing of massive information in a big data environment.

Internet traffic classification method based on selective clustering ensemble of mutual information

DING Yaojun, CAI Wandong

Journal of Computer Applications 2013, 33 (01): 80-82. DOI: 10.3724/SP.J.1087.2013.00080

Because Internet traffic is difficult to label and the generalization ability of a single clustering algorithm is weak, a selective clustering ensemble method based on Mutual Information (MI) was proposed to improve the accuracy of traffic classification. In the method, the Normalized Mutual Information (NMI) between the clustering results of the K-means algorithm with different initial cluster numbers and the distribution of protocol labels in the training set was computed first, and a series of K values (initial cluster numbers for K-means) was then selected based on NMI. Finally, a consensus function based on Quadratic Mutual Information (QMI) was used to build the consensus partition, and the clusters were labeled with a semi-supervised method. The overall accuracies of the clustering ensemble method and a single clustering algorithm were compared on four testing sets, and the experimental results show that the overall accuracy of the clustering ensemble method can reach 90%. By using a clustering ensemble model to classify Internet traffic, the proposed method improved both the overall accuracy of traffic classification and the stability of classification across different data sets.
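The NMI-based selection of K values can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the abstract does not say which NMI normalization was used, so the geometric-mean form I(U;V) / sqrt(H(U) H(V)) is assumed here; the K-means runs are replaced by hand-written partitions; the QMI consensus step is not shown; and all names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(assignments):
    """Shannon entropy (natural log) of a list of cluster or class labels."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n)
                for c in Counter(assignments).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())

def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed choice)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a single-cluster labeling carries no information
    return mutual_information(a, b) / math.sqrt(h_a * h_b)

def select_k_values(partitions_by_k, protocol_labels, keep):
    """Return the `keep` values of K whose partitions best match the labels."""
    ranked = sorted(partitions_by_k,
                    key=lambda k: nmi(partitions_by_k[k], protocol_labels),
                    reverse=True)
    return ranked[:keep]

# Hypothetical labeled training flows and stand-ins for K-means outputs.
protocol_labels = ["www", "www", "mail", "mail", "p2p", "p2p"]
partitions_by_k = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    6: [0, 1, 2, 3, 4, 5],
}
selected = select_k_values(partitions_by_k, protocol_labels, keep=2)
```

In this toy setup the partition for K = 3 matches the protocol labels exactly, so it ranks first. In practice the partitions would come from running K-means with each candidate K on the traffic features.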
